A flocking based algorithm for document clustering analysis

نویسندگان

  • Xiaohui Cui
  • Jinzhu Gao
  • Thomas E. Potok
چکیده

Social animals or insects in nature often exhibit a form of emergent collective behavior known as flocking. In this paper, we present a novel Flocking based approach for document clustering analysis. Our Flocking clustering algorithm uses stochastic and heuristic principles discovered from observing bird flocks or fish schools. Unlike other partition clustering algorithm such as K-means, the Flocking based algorithm does not require initial partitional seeds. The algorithm generates a clustering of a given set of data through the embedding of the high-dimensional data items on a two-dimensional grid for easy clustering result retrieval and visualization. Inspired by the self-organized behavior of bird flocks, we represent each document object with a flock boid. The simple local rules followed by each flock boid result in the entire document flock generating complex global behaviors, which eventually result in a clustering of the documents. We evaluate the efficiency of our algorithm with both a synthetic dataset and a real document collection that includes 100 news articles collected from the Internet. Our results show that the Flocking clustering algorithm achieves better performance compared to the Kmeans and the Ant clustering algorithm for real document clustering. 2006 Elsevier B.V. All rights reserved.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Distributed Agent Implementation of Multiple Species Flocking Model for Document Partitioning Clustering

The Flocking model, first proposed by Craig Reynolds, is one of the first bio-inspired computational collective behavior models that has many popular applications, such as animation. Our early research has resulted in a flock clustering algorithm that can achieve better performance than the Kmeans or the Ant clustering algorithms for data clustering. This algorithm generates a clustering of a g...

متن کامل

A Gravitational Search Algorithm-Based Single-Center of Mass Flocking Control for Tracking Single and Multiple Dynamic Targets for Parabolic Trajectories in Mobile Sensor Networks

Developing optimal flocking control procedure is an essential problem in mobile sensor networks (MSNs). Furthermore, finding the parameters such that the sensors can reach to the target in an appropriate time is an important issue. This paper offers an optimization approach based on metaheuristic methods for flocking control in MSNs to follow a target. We develop a non-differentiable optimizati...

متن کامل

خوشه‌بندی اسناد مبتنی بر آنتولوژی و رویکرد فازی

Data mining, also known as knowledge discovery in database, is the process to discover unknown knowledge from a large amount of data. Text mining is to apply data mining techniques to extract knowledge from unstructured text. Text clustering is one of important techniques of text mining, which is the unsupervised classification of similar documents into different groups. The most important step...

متن کامل

Swarm Intelligence in Text Document Clustering

In this chapter, we introduce three nature inspired swarm intelligence clustering approaches for document clustering analysis. The major challenge of today’s information society is being overwhelmed with information on any topic they are searching for. Fast and high-quality document clustering algorithms play an important role in helping users to effectively navigate, summarize, and organize th...

متن کامل

Graphics Processing Unit Enhanced Parallel Document Flocking Clustering

Abstract: Analyzing and clustering documents is a complex problem. One explored method of solving this problem borrows from nature, imitating the flocking behavior of birds. One limitation of this method of document clustering is its complexity O(n). As the number of documents grows, it becomes increasingly difficult to generate results in a reasonable amount of time. In the last few years, the...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Journal of Systems Architecture

دوره 52  شماره 

صفحات  -

تاریخ انتشار 2006